suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(forcats))
library(gapminder)
library(kableExtra)
library(knitr)
library(ggplot2)
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
First, filter the Gapminder data to remove observations associated with the continent of Oceania.
noOceania <- gapminder %>%
filter(continent != "Oceania")
Then remove unused factor levels.
dropOceania <- noOceania %>%
droplevels()
Let’s compare the number of rows and levels in the three data sets:
tibble1 <- tibble(
  name = c("gapminder", "noOceania", "dropOceania"),
  num_of_level = c(nlevels(gapminder$continent), nlevels(noOceania$continent), nlevels(dropOceania$continent)),
  num_of_row = c(nrow(gapminder), nrow(noOceania), nrow(dropOceania))
)
knitr::kable(tibble1) %>%
kable_styling(bootstrap_options = "bordered",latex_options = "basic",full_width = F)
| name | num_of_level | num_of_row |
|---|---|---|
| gapminder | 5 | 1704 |
| noOceania | 5 | 1680 |
| dropOceania | 4 | 1680 |
The table shows that filtering Oceania out of gapminder reduces the number of rows from 1704 to 1680 but leaves the number of levels in continent at 5. Dropping the unused level with droplevels( ) reduces the number of levels to 4, while the number of rows stays at 1680.
Then let’s check the levels of each data set:
levels(gapminder$continent)
## [1] "Africa" "Americas" "Asia" "Europe" "Oceania"
levels(noOceania$continent)
## [1] "Africa" "Americas" "Asia" "Europe" "Oceania"
levels(dropOceania$continent)
## [1] "Africa" "Americas" "Asia" "Europe"
So droplevels( ) removes the unused Oceania level.
Let’s order the levels of the continent factor by the largest lifeExp, in descending order:
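As a side note, forcats offers fct_drop( ) for the same job on a single factor. The sketch below uses a made-up toy factor (not the Gapminder data) to show that it behaves like droplevels( ) applied to the bare factor; the names f and f_drop are illustrative only.

```r
library(forcats)

# Hypothetical toy factor: level "c" is declared but never observed.
f <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))

nlevels(f)              # the unused level "c" still counts

# fct_drop() removes levels that have no observations.
f_drop <- fct_drop(f)
levels(f_drop)

# droplevels() on the bare factor gives the same set of levels.
identical(levels(droplevels(f)), levels(f_drop))
```

The difference is mainly one of scope: droplevels( ) on a data frame touches every factor column, while fct_drop( ) acts on the one factor you pass in.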
reOrder <- gapminder %>%
group_by(continent) %>%
summarise(maxLifeExp = max(lifeExp)) %>%
mutate(continent = fct_reorder(continent, maxLifeExp, max, .desc = TRUE))
levels(reOrder$continent)
## [1] "Asia" "Europe" "Oceania" "Americas" "Africa"
Let’s check the order by using arrange( ):
arran <- gapminder %>%
group_by(continent) %>%
summarise(largest_lifeExp = max(lifeExp)) %>%
arrange(desc(largest_lifeExp))
knitr::kable(arran) %>%
kable_styling(bootstrap_options = "bordered",latex_options = "basic",full_width = F)
| continent | largest_lifeExp |
|---|---|
| Asia | 82.603 |
| Europe | 81.757 |
| Oceania | 81.235 |
| Americas | 80.653 |
| Africa | 76.442 |
We can see it’s the same order as the result above.
Let’s compare the order of the levels in reOrder and arran:
levels(reOrder$continent) #order of factor after using fct_reorder()
## [1] "Asia" "Europe" "Oceania" "Americas" "Africa"
levels(arran$continent) #order of factor after using arrange()
## [1] "Africa" "Americas" "Asia" "Europe" "Oceania"
In reOrder, fct_reorder( ) changes the level order to the expected one: descending max lifeExp. In arran, arrange( ) sorts the rows but leaves the level order unchanged.
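The same contrast can be reproduced on a tiny data set, which makes the mechanism easier to see. This is a minimal sketch on a made-up tibble; toy, sorted and releveled are hypothetical names, not objects from the analysis above.

```r
library(dplyr)
library(forcats)

# Hypothetical toy data: three groups with one value each.
toy <- tibble(
  grp = factor(c("a", "b", "c")),
  val = c(2, 3, 1)
)

# arrange() reorders the rows only; the factor levels are untouched.
sorted <- toy %>% arrange(desc(val))
as.character(sorted$grp)  # row order follows val
levels(sorted$grp)        # still the original level order

# fct_reorder() rewrites the levels; the rows stay where they were.
releveled <- toy %>% mutate(grp = fct_reorder(grp, val, .desc = TRUE))
levels(releveled$grp)
```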
We have seen the effect of fct_reorder( ) and arrange( ) on the order of levels; now let’s see their effect on a figure. First, plot the max lifeExp of each Asian country with fct_reorder( ):
gapAsia <- gapminder %>%
filter(continent == "Asia") %>%
group_by(country) %>%
summarise(maxLifeExp = max(lifeExp))
gapAsia %>%
  ggplot(aes(maxLifeExp, fct_reorder(country, maxLifeExp))) +
  geom_point(aes(color = country)) +
  xlab("Max LifeExp") + ylab("country") + ggtitle("Max LifeExp in Asian Countries")
The countries are now ordered by max lifeExp in the figure. Next, let’s use arrange( ) alone:
gapAsia %>%
  arrange(maxLifeExp) %>%
  ggplot(aes(maxLifeExp, country)) +
  geom_point(aes(color = country)) +
  xlab("Max LifeExp") + ylab("country") + ggtitle("Max LifeExp in Asian Countries")
This time the countries stay in their original level order, because arrange( ) sorts the rows and does not change the levels of country. Finally, let’s combine arrange( ) and fct_reorder( ):
gapAsia %>%
  arrange(maxLifeExp) %>%
  ggplot(aes(maxLifeExp, fct_reorder(country, maxLifeExp))) +
  geom_point(aes(color = country)) +
  xlab("Max LifeExp") + ylab("country") + ggtitle("Max LifeExp in Asian Countries")
The order of the countries has changed again. So fct_reorder( ), alone or combined with arrange( ), changes the order of the levels shown in the figure, whereas arrange( ) alone does not.
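If the goal is to make a row order produced by arrange( ) show up in a figure, one option not used above is to freeze that order into the levels with forcats’ fct_inorder( ). The sketch below uses a made-up tibble; toy and ordered_toy are hypothetical names.

```r
library(dplyr)
library(forcats)

# Hypothetical toy data.
toy <- tibble(
  grp = factor(c("a", "b", "c")),
  val = c(2, 3, 1)
)

# Sort the rows, then set the levels to the order in which they now appear.
ordered_toy <- toy %>%
  arrange(val) %>%
  mutate(grp = fct_inorder(grp))

levels(ordered_toy$grp)  # levels now follow the sorted rows
```

A plot that maps grp to an axis would then follow the sorted order, because ggplot2 lays out a factor by its levels rather than by row position.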
To experiment with file input and output, first create a new data frame of the Asian countries whose max lifeExp exceeds 75 years:
df <- gapminder %>%
filter(continent == "Asia" & lifeExp > 75) %>%
group_by(country) %>%
summarise(maxLifeExp = max(lifeExp))
knitr::kable(df) %>%
kable_styling(bootstrap_options = "bordered",latex_options = "basic",full_width = F)
| country | maxLifeExp |
|---|---|
| Bahrain | 75.635 |
| Hong Kong, China | 82.208 |
| Israel | 80.745 |
| Japan | 82.603 |
| Korea, Rep. | 78.623 |
| Kuwait | 77.588 |
| Oman | 75.640 |
| Singapore | 79.972 |
| Taiwan | 78.400 |
Then write the data frame to a CSV file and read it back:
write_csv(df,"df.csv")
readDf <- read_csv("df.csv")
## Parsed with column specification:
## cols(
## country = col_character(),
## maxLifeExp = col_double()
## )
readDf
## # A tibble: 9 x 2
## country maxLifeExp
## <chr> <dbl>
## 1 Bahrain 75.6
## 2 Hong Kong, China 82.2
## 3 Israel 80.7
## 4 Japan 82.6
## 5 Korea, Rep. 78.6
## 6 Kuwait 77.6
## 7 Oman 75.6
## 8 Singapore 80.0
## 9 Taiwan 78.4
After the write_csv( )/read_csv( ) round trip, country has changed from a factor to a character column.
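One way to get a factor back from a CSV file is to declare the column type when reading. This is a minimal sketch on a made-up two-row data frame written to a temporary file; toy, path and back are hypothetical names, and col_factor( ) is used with its default of inferring the levels from the data, which need not match the original level order.

```r
library(readr)

# Hypothetical toy data with a factor column.
toy <- data.frame(
  country = factor(c("Japan", "Israel")),
  maxLifeExp = c(82.603, 80.745)
)

path <- tempfile(fileext = ".csv")
write_csv(toy, path)

# Declare the column types on import instead of letting readr guess them.
back <- read_csv(path, col_types = cols(
  country = col_factor(),
  maxLifeExp = col_double()
))

class(back$country)
```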
saveRDS( ) saves a single R object to a file, and readRDS( ) restores it:
saveRDS(df,"df.rds")
readRds <- readRDS("df.rds")
readRds
## # A tibble: 9 x 2
## country maxLifeExp
## <fct> <dbl>
## 1 Bahrain 75.6
## 2 Hong Kong, China 82.2
## 3 Israel 80.7
## 4 Japan 82.6
## 5 Korea, Rep. 78.6
## 6 Kuwait 77.6
## 7 Oman 75.6
## 8 Singapore 80.0
## 9 Taiwan 78.4
After the saveRDS( )/readRDS( ) round trip, country is still a factor.
dput( ) writes an ASCII text representation of an R object to a file or connection, and dget( ) uses that representation to recreate the object.
dput(df,"df.R")
readDput <- dget("df.R")
readDput
## # A tibble: 9 x 2
## country maxLifeExp
## <fct> <dbl>
## 1 Bahrain 75.6
## 2 Hong Kong, China 82.2
## 3 Israel 80.7
## 4 Japan 82.6
## 5 Korea, Rep. 78.6
## 6 Kuwait 77.6
## 7 Oman 75.6
## 8 Singapore 80.0
## 9 Taiwan 78.4
So after using dput( )/dget( ), country is still a factor.
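Both formats keep more than the factor class: they also keep the order of the levels. The sketch below checks this with a made-up data frame whose levels are deliberately non-alphabetical; all object and file names are hypothetical, and the files go to temporary paths.

```r
# Hypothetical toy data: the levels are given in a custom order.
toy <- data.frame(
  grp = factor(c("a", "b", "c"), levels = c("c", "a", "b")),
  val = c(2, 3, 1)
)

# Round trip through the binary RDS format.
rds_path <- tempfile(fileext = ".rds")
saveRDS(toy, rds_path)
from_rds <- readRDS(rds_path)

# Round trip through the plain-text dput()/dget() representation.
dput_path <- tempfile(fileext = ".R")
dput(toy, dput_path)
from_dput <- dget(dput_path)

levels(from_rds$grp)
levels(from_dput$grp)
```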
Let’s first look at a previous plot, which shows histograms of lifeExp for each continent:
ggplot(gapminder, aes(lifeExp)) + facet_wrap( ~ continent, scales = "free_x") + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
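The message above asks for an explicit bin width. A minimal sketch of how that could look is given here; the value of 5 years is an arbitrary assumption for illustration, not a choice made in the original plot, and hist_plot is a hypothetical name.

```r
library(ggplot2)
library(gapminder)

# Same histogram as above, but with an explicit bin width (assumed: 5 years),
# which silences the stat_bin() message.
hist_plot <- ggplot(gapminder, aes(lifeExp)) +
  facet_wrap(~ continent, scales = "free_x") +
  geom_histogram(binwidth = 5)

# print(hist_plot) draws the figure.
```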
The first thing to improve is that a histogram only shows the distribution of lifeExp, so let’s switch to a scatter plot that shows the trend of lifeExp from the 1950s to the 1990s:
# filter the data
newGap <- gapminder %>%
filter(year >= 1950 & year <= 1999)
# make y scale free, change the color to a colour-blind friendly scheme, change the breaks
(point <- newGap %>%
  ggplot(aes(year, lifeExp)) +
  facet_wrap(~ continent, scales = "free_y") +
  geom_point(aes(color = lifeExp), alpha = 0.3) +
  labs(title = "lifeExp for 5 continent from 1950s~1990s") +
  scale_color_viridis_c(trans = "log10", breaks = 10 * (1:8)))
Change the theme:
(point_new <- point + theme_minimal())
Differences: the new plot shows a rough trend and the distribution of lifeExp in each continent. The color also varies with lifeExp, which makes it easier to see which continent has a higher lifeExp in a facet plot with free y scales. To make the plot look simpler, the theme is changed with theme_minimal( ).
But if we want to look more precisely at the distribution in each year, we can use a box plot:
(box_plot <- newGap %>%
  ggplot(aes(year, lifeExp)) +
  facet_wrap(~ continent, scales = "free_y") +
  geom_boxplot(fill = "blue", color = "orange", outlier.color = "blue", alpha = 0.3, aes(group = year)) +
  theme_minimal() +
  labs(title = "lifeExp for 5 continent from 1950s~1990s"))
Differences: a box plot clearly shows the minimum, first quartile, median, third quartile, and maximum for each year, as well as the trend over the years. However, we still can’t read exact values off the plot, so we convert it to plotly.
For the first scatter plot, convert the ggplot object to plotly with ggplotly( ):
ggplotly(point_new)
For the box plot, convert the ggplot object to plotly with ggplotly( ):
ggplotly(box_plot)
Unlike ggplot2, plotly makes interactive, publication-quality graphs. Readers can interact with the plot in various ways through the toolbar above it, and the data value is shown in a tooltip when the pointer moves over a data point.
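The contents of the hover tooltip can be narrowed with the tooltip argument of ggplotly( ). The sketch below uses the built-in mtcars data rather than the plots above, so p and widget are hypothetical names chosen for illustration.

```r
library(ggplot2)
library(plotly)

# A small static plot on a built-in data set.
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Convert to an interactive widget; show only the x and y aesthetics on hover.
widget <- ggplotly(p, tooltip = c("x", "y"))

class(widget)
```

Typing widget at the console or in a knitted HTML document renders the interactive version.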
Then, let’s try plot_ly( ) to make a 3D plot:
newGap %>%
plot_ly(x = ~year,
y = ~continent,
z = ~lifeExp,
type = "scatter3d",
mode = "markers",
marker = list(size = 3.5, color = ~lifeExp, colorscale = 'Viridis'),
opacity = 0.3)
In the 3D plot, we can combine the data of all 5 continents into one plot, and by changing the viewing angle we can also inspect the data of a single continent.
Use ggsave( ) to explicitly save a plot to a file. By default, ggsave( ) saves the last plot that was displayed, so let’s first draw a plot:
(save_plot <- gapminder %>%
  ggplot(aes(continent, gdpPercap)) +
  scale_y_log10() +
  geom_boxplot(aes(fill = continent), alpha = 0.5))
ggsave( ) guesses the type of graphics device from the file extension, so the only argument you need to supply is the filename. To play around with the various options in ggsave( ), I will mostly use the .png format:
ggsave("save_plot_1.png")
## Saving 7 x 5 in image
Then try changing the width and height of the saved image:
ggsave("width8_height6.png", width = 8, height = 6)
Change the resolution of the saved image:
ggsave("dpi_72.png", dpi = 72)
## Saving 7 x 5 in image
Change the scale of the image:
ggsave("scale_0.6.png", scale = 0.6)
## Saving 4.2 x 3 in image
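The sizes reported in these messages follow from simple arithmetic: scale multiplies the width and height in inches, and the pixel dimensions of a raster file are inches times dpi. A small sketch of that calculation, taking the 7 x 5 inch default from the messages above and 300 as ggsave( )’s default dpi:

```r
# Default size reported by ggsave() above, in inches.
size_in <- c(width = 7, height = 5)

# scale = 0.6 shrinks both dimensions, giving the "4.2 x 3 in" message.
scaled_in <- size_in * 0.6
scaled_in

# Pixel dimensions of the unscaled image at two resolutions.
px_72 <- size_in * 72    # dpi = 72
px_300 <- size_in * 300  # dpi = 300, the ggsave() default
px_72
px_300
```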
Try writing the image to a vector format, PDF:
ggsave("vector_image.pdf")
## Saving 7 x 5 in image
Although ggsave( ) saves the last plot displayed by default, to save an earlier plot we need to specify it with the plot argument. For example, to save box_plot:
ggsave("box_plot.png", plot = box_plot)
## Saving 7 x 5 in image
With the plot argument supplied, ggsave( ) saves the plot we want.
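Putting the options together, a call that names the plot and sets the size and resolution explicitly does not depend on which plot was displayed last. This is a minimal sketch on the built-in mtcars data, writing to a temporary file; p, out and the particular size and dpi values are assumptions for illustration.

```r
library(ggplot2)

# A small plot stored in a variable rather than relied on as "the last plot".
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Write to a temporary .png; the device is inferred from the extension.
out <- tempfile(fileext = ".png")
ggsave(out, plot = p, width = 6, height = 4, units = "in", dpi = 150)

file.exists(out)
```

Because width and height are given, ggsave( ) does not print the "Saving ... in image" message here.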